Integrating Articulatory Features into Acoustic Models for Speech Recognition
Author
Abstract
It is often assumed that acoustic-phonetic or articulatory features can be beneficial for automatic speech recognition (ASR), e.g. because of their supposedly greater noise robustness or because they provide a more convenient interface to higher-level components of ASR systems such as pronunciation modeling. However, the success of these features when used as an alternative to standard acoustic speech signal representations (e.g. MFCCs) has so far been demonstrated only for limited domains, such as phone recognition or small-vocabulary speech recognition. On more challenging tasks, e.g. large-vocabulary speech recognition, standard acoustic features have consistently shown superior performance. This study compares the performance of standard acoustics-based systems to that of articulatory feature-based systems on medium to large vocabulary recognition tasks. Results suggest that, for optimal recognition performance, it is more advantageous to selectively combine information from both acoustic and articulatory representations than to use an articulatory feature-based representation alone. Data-driven techniques are applied to determine what kind of information articulatory features can contribute in addition to standard acoustic speech features.
Similar resources
Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition
Studies have shown that articulatory information helps model speech variability and, consequently, improves speech recognition performance. But learning speaker-invariant articulatory models is challenging, as speaker-specific signatures in both the articulatory and acoustic space increase complexity of speech-to-articulatory mapping, which is already an ill-posed problem due to its inherent no...
Articulatory features for "meeting" speech recognition
“Meeting” speech, for example from the RT-04S task, contains a mixture of different speaking styles that leads to word error rates higher than 25% even when close-talking microphones are being used. The problem is even more serious, as word error rates are particularly high when speakers use a clear speaking mode, for example because they want to stress an important point. Previous work showed ...
Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework
Most of the current state-of-the-art speech recognition systems are based on speech signal parametrizations that crudely model the behavior of the human auditory system. However, little or no use is usually made of the knowledge on the human speech production system. A data-driven statistical approach to incorporate this knowledge into ASR would require a substantial amount of data, which are n...
Acoustic feature combination for speech recognition
In this thesis, the use of multiple acoustic features of the speech signal is considered for speech recognition. The goals of this thesis are twofold: on the one hand, new acoustic features are developed, on the other hand, feature combination methods are investigated in order to find an effective integration of the newly developed features into state-of-the-art speech recognition systems. The ...
Combining acoustic and articulatory feature information for robust speech recognition
The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label “articulatory” include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articul...